bioRxiv bioinformatics highlights: February 2019
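Have you ever wondered how "bioRxiv" should be pronounced? I used to read it as "bio-Xiv". That seems to get by, but the correct pronunciation is "bio-archive" (/ˈɑː.kaɪv/), that is, an archive of biology. Let us take this chance to introduce bioRxiv to readers who may not be familiar with preprints. According to the official description, bioRxiv is an online collection site for life-science preprints, run by the renowned Cold Spring Harbor Laboratory as a non-profit platform for research and education. Most of the results posted there are later formally published【1-3】. In other words, authors skip the lengthy review process and post their work before publication, at the cost, of course, of the rigor that peer review brings. The word "archive" sounds rather old-fashioned and may suggest that bioRxiv papers are dusty files. Quite the opposite: bioRxiv is where you can find the latest news from the frontier of science, and the platform gives researchers around the world a fast channel for communication and collaboration. Preprint servers such as bioRxiv are now favored by more and more scholars, especially in bioinformatics【1-3】. Below are the ten bioRxiv bioinformatics papers we picked for February, including two pairs of back-to-back papers.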
1. Back-to-back
1.1【Sequencing】BGI vs Illumina: which is better for single-cell sequencing? Different costs, similar performance
Comparative performance of the BGI and Illumina sequencing technology for single-cell RNA-sequencing(CC-BY-ND 4.0)
The libraries generated by high-throughput single cell RNA-sequencing platforms such as the Chromium from 10X Genomics require considerable amounts of sequencing, typically due to the large number of cells. The ability to use this data to address biological questions is directly impacted by the quality of the sequence data. Here we have compared the performance of the Illumina NextSeq 500 and NovaSeq 6000 against the BGI MGISEQ-2000 platform using identical Single Cell libraries consisting of over 70,000 cells. Our results demonstrate a highly comparable performance between the NovaSeq 6000 and MGISEQ-2000 in sequencing quality, and cell, UMI, and gene detection. However, compared with the NextSeq 500, the MGISEQ-2000 platform performs consistently better, identifying more cells, genes, and UMIs at equalised read depth. We were able to call an additional 1,065,659 SNPs from sequence data generated by the BGI platform, enabling an additional 14% of cells to be assigned to the correct donor from a multiplexed library. However, both the NextSeq 500 and MGISEQ-2000 detected similar frequencies of gRNAs from a pooled CRISPR single cell screen. Our study provides a benchmark for high capacity sequencing platforms applied to high-throughput single cell RNA-seq libraries.
Fig. 1 of the original paper
Back-to-back: last November, the Teichmann lab at the Sanger Institute in Cambridge posted a similar paper on bioRxiv:
Comparative analysis of sequencing technologies platforms for single-cell transcriptomics
1.2【Genome editing】Prokaryotic Ago makes waves again? US and Russian researchers find new bacterial Agos that can work at 55℃
Programmable DNA cleavage by Ago nucleases from mesophilic bacteria Clostridium butyricum and Limnothrix rosea(CC-BY-NC-ND 4.0)
Argonaute (Ago) proteins are the key players in RNA interference in eukaryotes, where they function as RNA-guided RNA endonucleases. Prokaryotic Argonautes (pAgos) are much more diverse than their eukaryotic counterparts but their cellular functions and mechanisms of action remain largely unknown. Some pAgos were shown to use small DNA guides for endonucleolytic cleavage of complementary DNA in vitro. However, previously studied pAgos from thermophilic prokaryotes function at elevated temperatures which limits their potential use as a tool in genomic applications. Here, we describe two pAgos from mesophilic bacteria, Clostridium butyricum (CbAgo) and Limnothrix rosea (LrAgo), that act as DNA-guided DNA nucleases at physiological temperatures. In contrast to previously studied pAgos, CbAgo and LrAgo can use not only 5'-phosphorylated but also 5'-hydroxyl DNA guides, with diminished precision of target cleavage. Both LrAgo and CbAgo can tolerate guide/target mismatches in the seed region, but are sensitive to mismatches in the 3'-guide region. CbAgo is highly active under a wide range of conditions and can be used for programmable endonucleolytic cleavage of both single-stranded and double-stranded DNA substrates at moderate temperatures. The biochemical characterization of mesophilic pAgo proteins paves the way for their use for DNA manipulations both in vitro and in vivo.
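Back-to-back: a related companion paper was posted on bioRxiv on January 29 and updated on February 25. It comes from the group of the renowned John van der Oost at Wageningen University in the Netherlands: DNA-guided DNA cleavage at moderate temperatures by Clostridium butyricum Argonaute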
2.【Genomics】Just a buzzword? Three heavyweights of plant science join forces in a population genomics study and propose the pan-NLRome
The Arabidopsis thaliana pan-NLRome(CC-BY 4.0)
Disease is both among the most important selection pressures in nature and among the main causes of yield loss in agriculture. In plants, resistance to disease is often conferred by Nucleotide-binding Leucine-rich Repeat (NLR) proteins. These proteins function as intracellular immune receptors that recognize pathogen proteins and their effects on the plant. Consistent with evolutionarily dynamic interactions between plants and pathogens, NLRs are known to be encoded by one of the most variable gene families in plants, but the true extent of intraspecific NLR diversity has been unclear. Here, we define the majority of the Arabidopsis thaliana species-wide 'NLRome'. From NLR sequence enrichment and long-read sequencing of 65 diverse A. thaliana accessions, we infer that the pan-NLRome saturates with approximately 40 accessions. Despite the high diversity of NLRs, half of the pan-NLRome is present in most accessions. We chart the architectural diversity of NLR proteins, identify novel architectures, and quantify the selective forces that act on specific NLRs, domains, and positions. Our study provides a blueprint for defining the pan-NLRome of plant species.
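The claim that the pan-NLRome "saturates with approximately 40 accessions" rests on the idea of an accumulation curve: as more accessions are added, fewer new gene families turn up. The sketch below illustrates that idea only. The function name `accumulation_curve` and the toy data are hypothetical and are not taken from the paper, whose actual saturation analysis is not described in the abstract.

```python
import random


def accumulation_curve(accessions, n_permutations=100, seed=0):
    """Mean number of distinct gene families seen after adding 1..N accessions.

    `accessions` is a list of sets, one set of gene-family IDs per accession.
    The order of accessions is shuffled `n_permutations` times and averaged,
    so the curve does not depend on one arbitrary sampling order.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(accessions)
    for _ in range(n_permutations):
        order = accessions[:]
        rng.shuffle(order)
        seen = set()
        for i, families in enumerate(order):
            seen |= families
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Toy data (an assumption for illustration): 30 accessions, each carrying
# 50 shared "core" families plus 20 drawn from a pool of 100 accessory ones.
toy_rng = random.Random(1)
core = set(range(50))
accessory_pool = list(range(50, 150))
toy_accessions = [core | set(toy_rng.sample(accessory_pool, 20)) for _ in range(30)]

curve = accumulation_curve(toy_accessions)
for n in (1, 5, 10, 20, 30):
    print(n, round(curve[n - 1], 1))
```

When the curve flattens, adding further accessions contributes few new families, which is what "saturation" means in this context.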
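3.【Genomics】Genome sequencing of the Komodo dragon, the world's largest lizard, reveals the secrets of its environmental adaptation(CC-BY-NC-ND 4.0)
Fig. 1A of the original paper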
Monitor lizards are unique among ectothermic reptiles in that they have a high aerobic capacity and distinctive cardiovascular physiology which resembles that of endothermic mammals. We have sequenced the genome of the Komodo dragon (Varanus komodoensis), the largest extant monitor lizard, and present a high resolution de novo chromosome-assigned genome assembly for V. komodoensis, generated with a hybrid approach of long-range sequencing and single molecule physical mapping. Comparing the genome of V. komodoensis with those of related species showed evidence of positive selection in pathways related to muscle energy metabolism, cardiovascular homeostasis, and thrombosis. We also found species-specific expansions of a chemoreceptor gene family related to pheromone and kairomone sensing in V. komodoensis and several other lizard lineages. Together, these evolutionary signatures of adaptation reveal genetic underpinnings of the unique Komodo sensory, cardiovascular, and muscular systems, and suggest that selective pressure altered thrombosis genes to help Komodo dragons evade the anticoagulant effects of their own saliva. As the only sequenced monitor lizard genome, the Komodo dragon genome is an important resource for understanding the biology of this lineage and of reptiles worldwide.
4.【Genomics】From soil nematodes to human tissue: how many unknown viruses are hiding in 70 animal samples? Sequencing gives the answer
Discovery of several thousand highly diverse circular DNA viruses(CC0)
Virologists have posited the existence of millions of distinct viral species, but fewer than 9000 viral species are catalogued in GenBank's RefSeq database. We selectively enriched for and amplified the genomes of circular DNA viruses in over 70 animal samples, ranging from cultured soil nematodes to human tissue specimens. Over 2500 complete circular genomes, each representing a new viral taxon, were assembled, thoroughly annotated, and deposited in GenBank. The new genomes belong to dozens of established and emerging viral families. Some of the genomes have unexpected gene compositions that appear to be the result of recombination between ssDNA viruses and ssRNA viruses. In addition, hundreds of circular DNA elements that do not encode any discernable similarities to previously characterized sequences were identified. To characterize these "dark matter" sequences, we used an artificial neural network to identify candidate viral capsid proteins and cell culture assays to validate that some predicted capsid proteins formed virus-like particles. These data further the understanding of viral sequence diversity and allow for more comprehensive analysis of the virosphere.
5.【Genomics】US and Canadian researchers combine patch-clamp and scRNA-seq on more than 1,300 islet cells for a new view of the β-cell
Pancreas patch-seq links physiologic dysfunction in diabetes to single-cell transcriptomic phenotypes(CC-BY-NC-ND 4.0)
Pancreatic islet cells regulate glucose homeostasis through insulin and glucagon secretion; dysfunction of these cells leads to severe diseases like diabetes. Prior single-cell transcriptome studies have shown heterogeneous gene expression in major islet cell-types; however it remains challenging to reconcile this transcriptomic heterogeneity with observed islet cell functional variation. Here we achieved electrophysiological profiling and single-cell RNA sequencing in the same islet cell (pancreas patch-seq) thereby linking transcriptomic phenotypes to physiologic properties. We collected 1,369 cells from the pancreas of donors with or without diabetes and assessed function-gene expression networks. We identified a set of genes and pathways that drive functional heterogeneity in β-cells and used these to predict β-cell electrophysiology. We also report specific transcriptional programs that correlate with dysfunction in type 2 diabetes (T2D) and extend this approach to cryopreserved cells from donors with type 1 diabetes (T1D), generating a valuable resource for understanding islet cell heterogeneity in health and disease.
6.【Genomics】UK Biobank data mining links cannabis use to higher odds of depression and self-harm
Cannabis use, depression and self-harm: phenotypic and genetic relationships
Findings: In UK Biobank, cannabis use is associated with increased likelihood of depression (OR=1.64, 95% CI=1.59-1.70, p=1.19×10^-213) and self-harm (OR=2.85, 95% CI=2.69-3.01, p=3.46×10^-304). The strength of this phenotypic association is stronger when more severe trait definitions of cannabis use and depression are considered. Additionally, significant genetic correlations are seen between cannabis use and depression using consortia summary statistics (rg=0.289, SE=0.036, p=1.45×10^-15). Polygenic risk scores for cannabis use and depression both explain a small but significant proportion of variance in cannabis use, depression and self harm within a UK Biobank target sample. However, two-sample Mendelian randomisation analyses were not significant. Conclusions: Cannabis use is both phenotypically and genetically associated with depression and self harm. Future work dissecting the causal mechanism linking these traits may have implications for cannabis users.
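For readers who want to sanity-check numbers like these, here is a minimal sketch of how a two-sided p-value follows from an estimate and its standard error. It assumes the p-value comes from a normal (Wald-type) approximation, which the abstract does not state; the helper name `wald_two_sided_p` is hypothetical.

```python
import math


def wald_two_sided_p(estimate, standard_error):
    """Return (z, p) for a two-sided test of estimate == 0.

    Assumes estimate / standard_error is approximately standard normal.
    For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    """
    z = estimate / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Genetic correlation quoted in the abstract: rg = 0.289, SE = 0.036.
z, p = wald_two_sided_p(0.289, 0.036)
print(f"z = {z:.2f}, p = {p:.2e}")
```

With the rounded inputs from the abstract, the result is of the order of 10^-15, the same order as the reported p-value; an exact match is not expected because rg and SE are quoted to only three decimal places.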
7.【Omics】EMBL scientists use mass spectrometry data to map the landscape of the human phosphoproteome
The functional landscape of the human phosphoproteome(CC-BY 4.0)
Protein phosphorylation is a key post-translational modification regulating protein function in almost all cellular processes. While tens of thousands of phosphorylation sites have been identified in human cells to date, the extent and functional importance of the phosphoproteome remains largely unknown. Here, we have analyzed 6,801 publicly available phospho-enriched mass spectrometry proteomics experiments, creating a state-of-the-art phosphoproteome containing 119,809 human phosphosites. To prioritize functional sites, 59 features indicative of proteomic, structural, regulatory or evolutionary relevance were integrated into a single functional score using machine learning. We demonstrate how this prioritization identifies regulatory phosphosites across different molecular mechanisms and pinpoint genetic susceptibilities at a genomic scale. Several novel regulatory phosphosites were experimentally validated including a role in neuronal differentiation for phosphosites present in the SWI/SNF SMARCC2 complex member. The scored reference phosphoproteome and its annotations identify the most relevant phosphorylations for a given process or disease addressing a major bottleneck in cell signaling studies.
8.【Omics】Are transcriptome differences in adult mice already foreshadowed by DNA methylation early in life?
Early life DNA methylation profiles are indicative of age-related transcriptome changes(CC-BY-NC-ND 4.0)
Alterations to cellular and molecular programs with brain aging result in cognitive impairment and susceptibility to neurodegenerative disease. Changes in DNA methylation patterns, an epigenetic modification required for various CNS functions, are observed with aging and can be prevented by anti-aging interventions, but the functional outcomes of altered methylation on transcriptome profiles are poorly understood with brain aging. Integrated analysis of the hippocampal methylome and transcriptome with aging of male and female mice demonstrates that age-related differences in methylation and gene expression are anti-correlated within gene bodies and enhancers, but not promoters. Methylation levels at young age of genes altered with aging are positively associated with age-related expression changes even in the absence of significant changes to methylation with aging, a finding also observed in mouse Alzheimer's models. DNA methylation patterns established in youth, in combination with other epigenetic marks, are able to predict changes in transcript trajectories with aging. These findings are consistent with the developmental origins of disease hypothesis and indicate that epigenetic variability in early life may explain differences in age-related disease.
9.【Bioinformatics】A new version of HH-suite is out, claimed to be about 10 times faster than PSI-BLAST
HH-suite3 for fast remote homology detection and deep protein annotation(CC-BY-NC-ND 4.0)
Background: HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous sequences. Results: We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. This accelerated HHsearch by a factor 4 and HHblits by a factor 2 over the previous version 2.0.16. HHblits3 is ~10x faster than PSI-BLAST and ~20x faster than HMMER3. Jobs to perform HHsearch and HHblits searches with many query profile HMMs can be parallelized over cores and over servers in a cluster using OpenMP and message passing interface (MPI). The free, open-source, GNU GPL(v3)-licensed software is available at https://github.com/soedinglab/hh-suite. Conclusion: The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.
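The abstract credits much of the speed-up to a vectorized Viterbi algorithm. As a reminder of what Viterbi computes, here is a plain, non-vectorized sketch on a textbook two-state HMM. It is not HH-suite's profile-HMM alignment or its SIMD implementation; the states, probabilities and function name are toy assumptions chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence.

    Scores are kept in log space to avoid underflow on long sequences.
    Returns (path, probability of that path).
    """
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for landing in state s at time t.
            best_prev = max(
                states, key=lambda r: score[t - 1][r] + math.log(trans_p[r][s])
            )
            score[t][s] = (score[t - 1][best_prev]
                           + math.log(trans_p[best_prev][s])
                           + math.log(emit_p[s][observations[t]]))
            back[t][s] = best_prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, math.exp(score[-1][last])


# Toy model (hypothetical values, unrelated to protein profiles).
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)  # path is ['Healthy', 'Healthy', 'Fever'], probability about 0.01512
```

The inner loop over predecessor states is the part that a SIMD implementation evaluates for many cells at once, which is why vectorization pays off for this kind of dynamic programming.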
10.【Evolution】Vanderbilt University (US) researchers analyze 25 budding yeast genomes and find major gene losses early in evolution
Extensive loss of cell cycle and DNA repair genes in an ancient lineage of bipolar budding yeasts(CC-BY-NC 4.0)
Cell cycle checkpoints and DNA repair processes protect organisms from potentially lethal mutational damage. Compared to other budding yeasts in the subphylum Saccharomycotina, we noticed that a lineage in the genus Hanseniaspora exhibited very high evolutionary rates, low GC content, small genome sizes, and lower gene numbers. To better understand Hanseniaspora evolution, we analyzed 25 genomes, including 11 newly sequenced, representing 18 / 21 known species in the genus. Our phylogenomic analyses identify two Hanseniaspora lineages, the fast-evolving lineage (FEL), which began diversifying ~87 million years ago (mya), and the slow-evolving lineage (SEL), which began diversifying ~54 mya. Remarkably, both lineages lost genes associated with the cell cycle and genome integrity, but these losses were greater in the FEL. For example, all species lost the cell cycle regulator WHI5, and the FEL lost components of the spindle checkpoint pathway (e.g., MAD1, MAD2) and DNA damage checkpoint pathway (e.g., MEC3, RAD9). Similarly, both lineages lost genes involved in DNA repair pathways, including the DNA glycosylase gene MAG1, which is part of the base excision repair pathway, and the DNA photolyase gene PHR1, which is involved in pyrimidine dimer repair. Strikingly, the FEL lost 33 additional genes, including polymerases (i.e., POL4 and POL32) and telomere-associated genes (e.g., RIF1, RFA3, CDC13, PBP2). Echoing these losses, molecular evolutionary analyses reveal that, compared to the SEL, the FEL stem lineage underwent a burst of accelerated evolution, which resulted in greater mutational loads, homopolymer instabilities, and higher fractions of mutations associated with the common endogenously damaged base, 8-oxoguanine. 
We conclude that Hanseniaspora is an ancient lineage that has diversified and thrived, despite lacking many otherwise highly conserved cell cycle and genome integrity genes and pathways, and may represent a novel system for studying cellular life without them.
References
1. Abdill, Richard and Blekhman, Ran, 2018. Tracking the popularity and outcomes of all bioRxiv preprints. bioRxiv.
3. Nature Research (Nature自然科研). 30,000+ preprints and a million downloads a month: the flourishing preprint.